Abstract 


Coronavirus disease (COVID-19) has spread over the world since early 2020. The disease causes respiratory problems with 
symptoms such as cold, cough and fever. In January 2020 the World Health Organization declared the outbreak a Public Health 
Emergency of International Concern on the advice of its International Health Regulations Emergency Committee, and in March 2020 
it characterised COVID-19 as a pandemic. India has also suffered many deaths from this coronavirus. Vaccination started in India 
in early 2021; as a result of vaccination, the effect of this inflamm. 
Introduction 


The authors have described several disease outbreaks that have afflicted humanity over the course of world history. 
The World Health Organization (WHO), its cooperating clinicians and various national authorities around 
the globe continue to fight these pandemics to date. The novel coronavirus that appeared in the city of Wuhan, 
China, was reported to the WHO [6,7,8,9,10,16]. We have a slightly 
different date we might count this from, and the date most scientists will recall is the day they locked 
their lab and went home. Although the history of machine learning (ML) dates back to at least the 
1950s, the techniques have seen wide usage only in the last two to three decades. The main reasons 
are recent advances in computing power, the increasing availability of open-source software, and 
developments in data capture and database technologies. 


ML Techniques- 


Machine learning is a type of artificial intelligence (AI) that allows software applications to become more accurate at 
predicting outcomes without being explicitly programmed to do so. Machine learning algorithms use 
historical data as input to predict new output values. 
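As a minimal sketch of using historical data to predict new output values, the snippet below fits a straight line to a few hypothetical (x, y) pairs by ordinary least squares and then predicts y for an unseen x. The data and the function names `fit_line` and `predict` are illustrative assumptions and are not taken from this work.

```python
# Minimal sketch: "learn" from historical (x, y) pairs with ordinary least
# squares, then predict the output for a new input. Toy data only.

def fit_line(xs, ys):
    """Return (slope, intercept) that minimise the squared error on the history."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict(model, x):
    """Apply the fitted line to a new input value."""
    slope, intercept = model
    return slope * x + intercept

history_x = [1.0, 2.0, 3.0, 4.0]
history_y = [2.0, 4.0, 6.0, 8.0]   # hypothetical history following y = 2x
model = fit_line(history_x, history_y)
print(predict(model, 5.0))  # 10.0
```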


Fig1: Machine Learning Technique (branches include supervised, unsupervised and semi-supervised learning, regression, classification and decision tree) 
Supervised Learning- 


Supervised learning trains a model on data samples from a data source whose correct classification has 
already been assigned. Supervised machine learning techniques are applicable in numerous 
domains [4,5,17]. One standard formulation of the supervised learning task is the classification problem. 
Fig2: Supervised Learning 


Decision tree Algorithm- 


We begin with an overview of decision trees, since they are the building blocks of the SML algorithms 
discussed in this section [3,5,16]. There are many tree-based algorithms for classification and regression. 


The algorithm works as follows: 
1. Start from the root node with all the data. 


2. Split each node into two child nodes to minimize some impurity measure (defined later). The best 
split is found by searching through all possible combinations of variables and their split points. 


3. The tree is grown until a stopping rule is reached. Tree size is controlled by several hyper-parameters 
which are selected by hyper-parameter tuning. 


4. Finally, the data within each terminal node (or leaf) is used for prediction: the node sample mean for 
continuous responses, and a majority vote or class proportions for binary/categorical responses. 
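The four steps above can be sketched for the classification case as a simplified CART-style tree. This sketch makes several assumptions that are not the configuration used in this work: Gini impurity is used as the impurity measure, `max_depth` and `min_samples` serve as the stopping hyper-parameters, and all function names and the toy data are hypothetical. Only the majority-vote leaf is shown; a regression tree would store the node mean instead.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_k p_k**2 over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def partition(rows, labels, feature, threshold):
    """Send rows with value <= threshold to the left child, the rest to the right."""
    left, right = ([], []), ([], [])
    for r, y in zip(rows, labels):
        side = left if r[feature] <= threshold else right
        side[0].append(r)
        side[1].append(y)
    return left, right

def best_split(rows, labels):
    """Step 2: try every feature and every observed value as a threshold,
    keeping the binary split with the lowest weighted child impurity."""
    best = None  # (weighted impurity, feature, threshold)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            (_, ly), (_, ry) = partition(rows, labels, f, t)
            if not ly or not ry:
                continue  # one side empty: not a real split
            score = (len(ly) * gini(ly) + len(ry) * gini(ry)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(rows, labels, depth=0, max_depth=3, min_samples=2):
    """Steps 1 and 3: start at the root with all the data and grow until a
    stopping rule (maximum depth, small node or pure node) is reached."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(labels) < min_samples or gini(labels) == 0.0:
        return {"leaf": majority}  # Step 4: majority vote in the leaf
    split = best_split(rows, labels)
    if split is None:
        return {"leaf": majority}
    _, f, t = split
    (lx, ly), (rx, ry) = partition(rows, labels, f, t)
    return {"feature": f, "threshold": t,
            "left": build_tree(lx, ly, depth + 1, max_depth, min_samples),
            "right": build_tree(rx, ry, depth + 1, max_depth, min_samples)}

def predict(tree, row):
    """Follow the splits from the root down to a leaf."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]

# Hypothetical one-feature toy data (not the paper's dataset).
rows = [[0.02], [0.03], [0.06], [0.07], [0.12], [0.15]]
labels = ["low", "low", "moderate", "moderate", "high", "high"]
tree = build_tree(rows, labels)
print(predict(tree, [0.05]), predict(tree, [0.2]))  # moderate high
```

The exhaustive threshold search mirrors Step 2 literally. It is quadratic in the number of samples per feature, which is acceptable for a sketch but not for large datasets.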


The decision tree (DT) algorithm is a classification model and belongs to supervised learning. 
Decision trees classify instances by sorting them based on feature values. Decision 
tree learning, used in data mining and machine learning, uses a decision tree as a predictive model that 
maps observations about an item to conclusions about the item's target value. 


Fig3: Decision Tree (yes/no branches leading to the leaves play outside, stay at home, bonfire and trekking) 
Unsupervised Learning- 


Unsupervised learning can be described as the general problem of extracting 
value from unlabelled data, which exists in vast quantities. A popular framework for unsupervised 
learning is representation learning, whose goal is to use unlabelled data to learn a representation 
that exposes important semantic features as easily decodable factors [12,13,16,18,20]. Unsupervised 
learning algorithms learn features directly from the data, without using labels. 


Fig4: Unsupervised Learning 
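One simple way to extract structure from unlabelled data is k-means clustering. The text above does not name it, and it is used here only as an illustration of unsupervised learning. The sketch below runs Lloyd's algorithm on hypothetical 2-D points. The function `kmeans`, its parameters and the data are illustrative assumptions.

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means for 2-D points. No labels are used: the clusters come
    only from the distances between the points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct points as initial centroids
    clusters = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for px, py in points:
            nearest = min(range(k), key=lambda j: (px - centroids[j][0]) ** 2
                                                  + (py - centroids[j][1]) ** 2)
            clusters[nearest].append((px, py))
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else old
            for c, old in zip(clusters, centroids)
        ]
        if updated == centroids:
            break  # converged: the assignments no longer change
        centroids = updated
    return centroids, clusters

# Two well-separated groups of hypothetical, unlabelled points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, 2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```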


Semi supervised Learning- 


Semi-supervised learning considers the problem of classification when only a small subset of the 
observations have corresponding class labels [13,14,15,16,19,21,22,23]. Semi-supervised learning is a 
branch of machine learning that makes use of a small set of labeled data and a large set of unlabeled 
data to improve learning accuracy. The main drawback of this approach is that it cannot cluster unknown 
data accurately. In active semi-supervised learning (ASSL), the training set consists of unlabeled 
and labeled samples. Since the cost of the sample annotation process 
is high (it can require the opinion of one or more specialists), the smallest possible set of samples 
should be labelled. 


Fig 5: Semi Supervised Learning (supervised learning uses labeled data, unsupervised learning uses unlabeled data, and semi-supervised learning combines both) 
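Self-training is one simple semi-supervised strategy. It is not the active ASSL approach described above. A classifier trained on the few labelled samples pseudo-labels the unlabelled sample it is most confident about, adds that sample to the training set, and repeats. The sketch below makes several assumptions: it uses a nearest-centroid classifier on 1-D values and treats the distance to the nearest centroid as its confidence. All names and data are hypothetical.

```python
def fit_centroids(xs, ys):
    """Nearest-centroid 'training': the mean value of each class."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {label: sum(v) / len(v) for label, v in groups.items()}

def nearest_label(centroids, x):
    """Return (label, distance) of the class centroid closest to x."""
    label = min(centroids, key=lambda c: abs(x - centroids[c]))
    return label, abs(x - centroids[label])

def self_train(labeled_x, labeled_y, unlabeled_x):
    """Repeatedly pseudo-label the unlabelled value closest to a class
    centroid (the most confident one), add it to the labelled set and refit."""
    xs, ys, pool = list(labeled_x), list(labeled_y), list(unlabeled_x)
    while pool:
        centroids = fit_centroids(xs, ys)
        best = min(pool, key=lambda x: nearest_label(centroids, x)[1])
        pool.remove(best)
        xs.append(best)
        ys.append(nearest_label(centroids, best)[0])  # pseudo-label
    return fit_centroids(xs, ys)

# Two labelled samples and four unlabelled ones (hypothetical values).
model = self_train([1.0, 9.0], ["a", "b"], [2.0, 8.0, 3.0, 7.0])
print(model)  # {'a': 2.0, 'b': 8.0}
```

Pseudo-labels are never revised here. A wrong early pseudo-label therefore propagates, which is a known weakness of plain self-training.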


Literature Review- 


According to [1], the following preprocessing steps are applied to the data to achieve better accuracy, 
efficiency and scalability of the classification process: 


1) Data cleaning: preprocessing the data to treat missing values [1], by replacing each missing value 
with the most commonly occurring value of its attribute using the pandas library. 
2) Data transformation and reduction: sometimes the dataset may need to be 
transformed or combined with other datasets. 
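The mode-replacement step in item 1 can be sketched with pandas. `DataFrame.mode()` returns each column's most frequent value or values, and `DataFrame.fillna()` accepts a Series keyed by column name. The helper name `fill_with_mode` and the toy frame are hypothetical and are not the paper's code or data.

```python
import pandas as pd

def fill_with_mode(df: pd.DataFrame) -> pd.DataFrame:
    """Replace missing values in each column with that column's most
    frequent value. DataFrame.mode() returns modes in sorted order, so a
    tie is resolved to the first value in that order."""
    modes = df.mode(dropna=True).iloc[0]  # first mode of every column
    return df.fillna(modes)               # fills column by column name

# Hypothetical toy records with gaps (None is treated as missing).
df = pd.DataFrame({
    "region": ["north", None, "north", "south"],
    "status": ["low", "high", None, "high"],
})
print(fill_with_mode(df))
```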


Relevance analysis: 


According to [1,2,3], the dataset may contain redundant attributes. Techniques such as correlation 
analysis can be used to find out whether any two attributes are statistically related. For example, a high 
correlation between attributes A1 and A2 would result in the removal of one of them. Another form of 
relevance analysis is attribute subset selection, which finds a reduced set of attributes such that the 
resulting probability distribution of the data classes is as close as possible to the original probability 
distribution obtained using all attributes. This is how attributes that do not contribute to classification are detected. 
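Correlation-based relevance analysis can be sketched in two steps. First, compute the Pearson correlation of every attribute pair. Then drop the second attribute of any pair whose absolute correlation exceeds a threshold. The 0.95 threshold, the helper names and the values of A1, A2 and A3 are assumptions chosen for illustration.

```python
from itertools import combinations
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length numeric lists
    (raises ZeroDivisionError if either list is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def drop_correlated(columns, threshold=0.95):
    """columns: dict of attribute name -> values. For each pair whose |r|
    exceeds the threshold, keep the first attribute and drop the second."""
    dropped = set()
    for a, b in combinations(columns, 2):
        if a in dropped or b in dropped:
            continue
        if abs(pearson(columns[a], columns[b])) > threshold:
            dropped.add(b)
    return {k: v for k, v in columns.items() if k not in dropped}

data = {
    "A1": [1.0, 2.0, 3.0, 4.0],
    "A2": [2.1, 3.9, 6.2, 7.8],   # roughly 2 * A1, so redundant
    "A3": [4.0, 1.0, 3.0, 2.0],
}
print(sorted(drop_correlated(data)))  # ['A1', 'A3']
```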


A comprehensive review is performed of the latest and most efficient approaches reported by 
researchers in the past three years on decision trees in different areas of machine 
learning. The details of these methods, such as the algorithms/approaches used, the datasets, and the 
findings achieved, are also summarized. In addition, this study highlights the most commonly used 
approaches and the methods that achieved the highest accuracy. All supervised machine learning algorithms are 
based on a predefined set of labels and a training set comprising articles that have been assigned 
one or more labels. 


Model Training: 


In this phase we process the training set and construct a classification model. This procedure 
includes three stages in which we correlate keywords, authors and journals with one or more labels. We also 
record several frequency values, which will be used later by the classification algorithm to 
determine the labels of the unclassified papers. Most research articles include a set of 
keywords placed between the abstract and the first section. 
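The frequency-recording idea in this training phase can be sketched as follows. The sketch is restricted to keywords; authors and journals could be counted in the same way. The scoring rule sums keyword/label co-occurrence counts and picks the largest. That rule is an assumption for illustration and is not necessarily the classification algorithm referred to above. The papers and labels are hypothetical.

```python
from collections import Counter, defaultdict

def train(papers):
    """papers: list of (keywords, labels) pairs. Count how often each
    keyword co-occurs with each label (the recorded frequency values)."""
    freq = defaultdict(Counter)
    for keywords, labels in papers:
        for kw in keywords:
            for label in labels:
                freq[kw.lower()][label] += 1
    return freq

def classify(freq, keywords):
    """Sum the label counts of every known keyword of an unclassified
    paper and return the best-scoring label (None if no keyword is known)."""
    scores = Counter()
    for kw in keywords:
        scores.update(freq.get(kw.lower(), Counter()))
    return scores.most_common(1)[0][0] if scores else None

# Hypothetical training papers: (keywords, labels).
papers = [
    (["decision tree", "pruning"], ["ML"]),
    (["epidemic", "COVID-19"], ["Health"]),
    (["COVID-19", "decision tree"], ["ML", "Health"]),
]
freq = train(papers)
print(classify(freq, ["Decision Tree"]))  # ML
```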
also good at recognizing spam. 
2. Machine learning can review large volumes of data and discover specific trends and patterns that would 
not be apparent to humans. For instance, an e-commerce website such as Amazon uses it to understand the 
browsing behaviour and purchase histories of its users, so that it can offer them the right products, deals 
and reminders, and it uses the results to show them relevant advertisements. 


algorithms learn and develop enough to fulfil their purpose with a considerable amount of accuracy and 
relevance. It also needs massive resources to function, which can mean additional requirements for 
computing power. 
train algorithms to classify data or predict outcomes accurately. As input data is fed into the model, it 
adjusts its weights until the model has been fitted appropriately, which occurs as part of the 
cross-validation process. 


1. The use of well-known, labelled input data makes supervised learning far more accurate and reliable 
than unsupervised learning. With access to labels, it can use them to improve its performance on a 
given task. 


2. Efficient in finding solutions to several linear and non-linear problems such as classification, 
robotics, prediction and factory control. Able to solve complex problems by having a hidden neuron layer. 


1. Performs poorly when there are non-linear relationships. Some supervised learning methods, such as 
linear regression, are not flexible enough to capture more complex structure. They take a lot of 
computation time, and it is also difficult to append the right polynomial or interaction terms. 

2. Takes a long 
compute by 
supervised learning can grow in complexity. Therefore, it does not give results in real time. Since the 
majority of the world's data is unlabelled, its performance is quite limited. 
formula like supervised learning. 
2. It is one of the closest to the type of learning that humans and mammals do. In fact, most of the 
fundamental algorithms of reinforcement learning (RL) are inspired by the human brain and neurological 
system. 


and can also get stuck at local optima. 

2. Needs a lot of training data and some time to train to become more accurate and efficient compared 
to other learning algorithms. 


STEPS TO BE FOLLOWED- 
Fig6: Steps To Be Followed 


PROPOSED ALGORITHM- 


Total case (T_test): the total number of tests, including Delta (Tpeta) and (Téettas), that have 
occurred in one month for COVID-19 in a specific state. 


Step 1 : start 
Step 2 : check the parameters (real values) of the attributes 
Step 3 : determine class to make decision 
Step 4 : if Pratio >= 0.1 then // Pratio in equation (iii) 
             Covid_status := high 
Step 5 : if Pratio < 0.1 && Pratio >= 0.05 && Uratio > 0.05 then 
             Covid_status := moderate 
Step 6 : if Pratio < 0.05 && Uratio < 0.05 then 
             Covid_status := low 
Fig7: Decision Tree 
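Steps 4 to 6 of the proposed algorithm can be sketched as a plain function. The sketch assumes that Pratio and Uratio have already been computed; their defining equation (iii) is not part of this section. It also reads the garbled lower thresholds of Steps 5 and 6 as 0.05. That reading is an assumption: a lower bound of 0.5 combined with Pratio < 0.1 could never be satisfied. Inputs that no step covers are reported as "undetermined", which is likewise an assumption.

```python
def covid_status(pratio: float, uratio: float) -> str:
    """Classify a state-month as high / moderate / low (sketch of Steps 4-6).
    Thresholds 0.1 and 0.05 follow the reconstructed pseudocode (assumption)."""
    if pratio >= 0.1:                                  # Step 4
        return "high"
    if 0.05 <= pratio < 0.1 and uratio > 0.05:         # Step 5
        return "moderate"
    if pratio < 0.05 and uratio < 0.05:                # Step 6
        return "low"
    # Combinations not covered by Steps 4-6,
    # e.g. 0.05 <= Pratio < 0.1 with Uratio <= 0.05.
    return "undetermined"

print(covid_status(0.12, 0.0), covid_status(0.07, 0.06), covid_status(0.01, 0.01))
```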


DATASET- 


Here we have fetched datasets of eastern and north-eastern states of India: Assam, Nagaland, Odisha, Tripura and West Bengal. As an example, the 
dataset of Tripura, one of the states taken, is shown in Table 1: 


[Table 1 values could not be recovered from the extracted text. Each row gives numeric attributes for Tripura for one month, ratio values, and a Covid_status class of low, moderate or high.] 


Table1: Dataset of Tripura 


Performance Analysis- 


We used WEKA to evaluate the performance. 
Fig8: The Preprocess tab of WEKA showing the Covid-19 status with the classes high, 
moderate and low. 


Weka Explorer, Classify tab: classifier ZeroR (rules.ZeroR), test option Cross-validation with 10 folds, class attribute (Nom) decission. 

Kappa statistic                  0 
Mean absolute error              0.3954 
Root mean squared error          0.4474 
Relative absolute error          100 % 
Root relative squared error      100 % 
Total Number of Instances        13 

=== Detailed Accuracy By Class === 
               TP Rate  FP Rate  Precision  Recall  F-Measure 
               0.000    0.000    ?          0.000   ? 
               1.000    1.000    0.615      1.000   0.762 
Weighted Avg.  0.615    0.615    ?          0.615   ? 
Fig9: Instances classified as low, moderate and high 
Fig10: Observation of the dataset of Tripura 
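ZeroR, the classifier shown in the WEKA output, ignores all attributes and always predicts the majority class of its training data. This is why its Kappa statistic is 0 and its relative errors are 100 %. With 13 instances, the reported weighted TP rate of 0.615 corresponds to the majority class covering 8 of them (8/13 ≈ 0.615). The sketch below reproduces that arithmetic with plain (unstratified, unshuffled) k-fold cross-validation. WEKA's own cross-validation is stratified and randomised. The labels A, B and C and the 8/3/2 class split are hypothetical; only the 8 of 13 majority is inferred from the output.

```python
from collections import Counter

def zero_r(train_labels):
    """ZeroR ignores every attribute and predicts the majority class."""
    return Counter(train_labels).most_common(1)[0][0]

def cross_validate_zero_r(labels, folds=10):
    """Plain k-fold cross-validation (no stratification, no shuffling):
    instance i goes to test fold i % folds. Returns overall accuracy."""
    n = len(labels)
    correct = 0
    for k in range(folds):
        test_idx = set(range(k, n, folds))
        train = [y for i, y in enumerate(labels) if i not in test_idx]
        guess = zero_r(train)
        correct += sum(labels[i] == guess for i in test_idx)
    return correct / n

# Hypothetical labels: 13 instances, 8 of them in the majority class.
labels = ["A"] * 8 + ["B"] * 3 + ["C"] * 2
print(round(cross_validate_zero_r(labels), 3))  # 0.615
```

Because ZeroR is the reference point for WEKA's relative error measures, any useful classifier on this dataset should beat 0.615 accuracy and a Kappa of 0.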


Conclusion- 


The COVID-19 epidemic has presented many countries in the world with an unprecedented public 
health crisis. COVID-19 has also had a significant effect on the world economy. Using machine 
learning, a model can be built to predict and analyse the effect of the epidemic on public health over 
time. Information and communication technology supports the decision-making 
process based on previously collected data. As the size of the collected data is huge, that makes it 
difficult 